Let $X \mid \mu \sim N_p(\mu, v_x I)$ and $Y \mid \mu \sim N_p(\mu, v_y I)$ be independent $p$-dimensional multivariate normal vectors with common unknown mean $\mu$. Based on observing $X = x$, we consider the problem of estimating the true predictive density $p(y \mid \mu)$ of $Y$ under expected Kullback-Leibler loss. Our focus here is the characterization of admissible procedures for this problem. We show that the class of all generalized Bayes rules is a complete class, and that the easily interpretable conditions of Brown and Hwang [Statistical Decision Theory and Related Topics (1982) III 205-230] are sufficient for a formal Bayes rule to be admissible.



1. Introduction. Let $X \mid \mu \sim N_p(\mu, v_x I)$ and $Y \mid \mu \sim N_p(\mu, v_y I)$ be independent $p$-dimensional multivariate normal vectors with a common unknown mean $\mu \in \mathbb{R}^p$. We assume that $v_x > 0$ and $v_y > 0$ are known. We let $p(x \mid \mu)$ and $p(y \mid \mu)$ denote the conditional densities of $X$ and $Y$, suppressing the dependence on $v_x$ and $v_y$ throughout.

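Based on observing only $X = x$, we consider the problem of estimating the density $p(y \mid \mu)$ of $Y$. The natural action space $\mathcal{A}_0$ consists of all proper densities on $\mathbb{R}^p$, that is,

(1)    $\mathcal{A}_0 = \left\{ g : \mathbb{R}^p \to \mathbb{R} \text{ such that } g(y) \ge 0 \text{ and } \int g(y)\,dy = 1 \right\}.$

For each observation $x \in \mathbb{R}^p$, a (nonrandomized) decision procedure $\hat p(\cdot \mid x) : \mathbb{R}^p \to \mathcal{A}_0$ chooses a $g \in \mathcal{A}_0$.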
We measure the goodness of fit of $g(y)$ to $p(y \mid \mu)$ by Kullback-Leibler (KL) loss

(2)    $L(\mu, g) = \begin{cases} \int p(y \mid \mu) \log \dfrac{p(y \mid \mu)}{g(y)}\,dy, & \text{if } g(y) > 0 \text{ a.e.}, \\ \infty, & \text{otherwise}, \end{cases}$

and evaluate a procedure $\hat p(\cdot \mid x)$ by its risk function

(3)    $R_{KL}(\mu, \hat p) = \int L(\mu, \hat p(\cdot \mid x))\, p(x \mid \mu)\,dx.$
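When the action $g$ is itself a normal density, the loss (2) has a closed form, which makes the definition concrete. The sketch below assumes $g$ is the $N_p(m, sI)$ density; the function name `kl_loss_normal` and the example numbers are illustrative and are not taken from the paper.

```python
import math

def kl_loss_normal(mu, v_y, m, s):
    """KL loss L(mu, g) of (2) when g is the N_p(m, s I) density.

    mu, m : sequences of length p (true mean, mean of the action g)
    v_y, s: positive scalars (true variance, variance of the action g)
    """
    p = len(mu)
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu, m))
    return 0.5 * (p * v_y / s + sq_dist / s - p + p * math.log(s / v_y))

# The loss is zero when the action equals the true density.
print(kl_loss_normal([0.0, 0.0], 1.0, [0.0, 0.0], 1.0))   # 0.0
# A shifted and over-dispersed action incurs a positive loss.
print(kl_loss_normal([0.0, 0.0], 1.0, [1.0, 0.0], 2.0))
```

Averaging this loss over $X \sim N_p(\mu, v_x I)$, with $m$ and $s$ depending on $x$, gives the risk (3) for any procedure whose actions are normal densities.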



For the comparison of two (nonrandomized) procedures, we say that $\hat p_1$ dominates $\hat p_2$ if $R_{KL}(\mu, \hat p_1) \le R_{KL}(\mu, \hat p_2)$ for all $\mu$ and with strict inequality for some $\mu$. A procedure $\hat p(\cdot \mid x)$ is called admissible if it cannot be dominated.

Two widely-used methods to obtain predictive densities are "plug-in" rules and Bayes rules. A plug-in rule

(4)    $\hat p_{\hat\mu}(y \mid x) = p(y \mid \mu = \hat\mu(x))$

simply substitutes an estimate $\hat\mu$ for $\mu$ in $p(y \mid \mu)$. In contrast, a Bayes rule integrates $\mu$ out with respect to a nonnegative and locally finite prior measure $M$ to obtain

(5)    $\hat p_M(y \mid x) = \dfrac{\int p(x \mid \mu)\, p(y \mid \mu)\, M(d\mu)}{\int p(x \mid \mu)\, M(d\mu)} = \int p(y \mid \mu)\, M(d\mu \mid x).$

When writing an expression such as (5), we implicitly assume that the denominator in the middle expression is finite for all $x$, and hence all terms in (5) are finite for all $x$. We use the symbol $\pi$ to denote the density of $M$ when it exists, and will write either $\hat p_\pi$ or $\hat p_M$ in that case.
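For a prior supported on finitely many points, the integrals in the Bayes rule reduce to sums and the rule can be evaluated directly. The following is a minimal sketch under that assumption, computing the ratio of $\int p(x \mid \mu)\, p(y \mid \mu)\, M(d\mu)$ to $\int p(x \mid \mu)\, M(d\mu)$; the helper names and the two-point prior are hypothetical.

```python
import math

def normal_pdf(z, mean, var):
    """Density of N_p(mean, var * I) at z (z and mean are sequences)."""
    p = len(z)
    sq = sum((a - b) ** 2 for a, b in zip(z, mean))
    return (2 * math.pi * var) ** (-p / 2) * math.exp(-sq / (2 * var))

def bayes_predictive(y, x, support, weights, v_x, v_y):
    """Bayes predictive density for a prior with mass weights[j] at support[j]."""
    num = sum(w * normal_pdf(x, m, v_x) * normal_pdf(y, m, v_y)
              for m, w in zip(support, weights))
    den = sum(w * normal_pdf(x, m, v_x) for m, w in zip(support, weights))
    return num / den

# Two-point prior on the real line (p = 1); all numbers are illustrative.
support, weights = [[-1.0], [1.0]], [0.5, 0.5]
print(bayes_predictive([0.3], [0.8], support, weights, v_x=1.0, v_y=0.5))
```

Because the prior enters both numerator and denominator, rescaling the weights leaves the rule unchanged, so the prior need not be normalized.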

Aitchison (1975) showed that for proper $M$, $\hat p_M(y \mid x)$ minimizes the average KL risk

(6)    $B_{KL}(M, \hat p) = \int R_{KL}(\mu, \hat p)\, M(d\mu).$

Aitchison also showed that the (formal) Bayes rule (5) under the uniform prior density $\pi_U(\mu) \equiv 1$, namely $\hat p_{\pi_U}(y \mid x)$, dominates the plug-in rule $p(y \mid \hat\mu_{MLE})$, which substitutes the maximum likelihood estimate $\hat\mu_{MLE} = x$ for $\mu$. Indeed, as will be seen in Section 3, all the admissible procedures for multivariate normal density prediction under KL loss are Bayes rules in the sense of (5).
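The domination of the plug-in rule by the uniform-prior Bayes rule can be seen from closed-form risks. The sketch below relies on two standard Gaussian facts that the text does not spell out: the plug-in rule is the $N_p(x, v_y I)$ density, with constant risk $p v_x / (2 v_y)$, and the uniform-prior rule is the $N_p(x, (v_x + v_y) I)$ density, with constant risk $(p/2) \log(1 + v_x / v_y)$. The function names are illustrative.

```python
import math

def risk_plugin_mle(p, v_x, v_y):
    """KL risk of the plug-in density N_p(x, v_y I); constant in mu."""
    return p * v_x / (2.0 * v_y)

def risk_uniform_bayes(p, v_x, v_y):
    """KL risk of the uniform-prior rule N_p(x, (v_x + v_y) I); constant in mu."""
    return 0.5 * p * math.log(1.0 + v_x / v_y)

# Compare the two constant risks for p = 3 and v_y = 1.
for v_x in (0.1, 1.0, 10.0):
    print(v_x, risk_plugin_mle(3, v_x, 1.0), risk_uniform_bayes(3, v_x, 1.0))
```

Since $\log(1 + t) < t$ for every $t > 0$, the second risk is strictly smaller for all $p$, $v_x$ and $v_y$, and the gap widens as $v_x / v_y$ grows.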

The constant risk Bayes rule $\hat p_{\pi_U}$ is best invariant, minimax, admissible when $p = 1$ [Murray (1977), Ng (1980) and Liang and Barron (2004)], and as we shall show in Section 2, admissible when $p = 2$. However, it is inadmissible when $p \ge 3$. This was first established by Komaki (2001), who showed that $\hat p_{\pi_U}$ is dominated by the Bayes rule under the (nonconstant) harmonic prior when $p \ge 3$. Liang (2002) further showed that $\hat p_{\pi_U}$ is dominated by proper Bayes rules under Strawderman priors when $p \ge 5$.
It is interesting to note the parallels between our predictive density estimation problem and the problem of estimating a multivariate normal mean under quadratic loss. Based on observing $Z \mid \mu \sim N_p(\mu, vI)$ with $v$ known, this latter problem is to estimate $\mu$ under quadratic risk

(7)    $R_Q^v(\mu, \hat\mu) = E_\mu \|\hat\mu - \mu\|^2,$

where the dependence of $R_Q$ on $v$ is indicated by the superscript $v$. Here the maximum likelihood estimator $\hat\mu_{MLE}$, which is best invariant, minimax and admissible when $p = 1$ or 2, is dominated by the Bayes rule $\hat\mu_\pi = \int \mu\, \pi(\mu \mid x)\,d\mu$ under the harmonic prior when $p \ge 3$ [Stein (1974)] and by the proper Bayes rule under the Strawderman priors when $p \ge 5$ [Strawderman (1971)]. Note that in the KL risk problem $\hat p_{\pi_U}(y \mid x)$, rather than $p(y \mid \hat\mu_{MLE})$, plays the same role as $\hat\mu_{MLE}$ in the quadratic risk problem. Recall that $\hat\mu_{MLE}$ can also be motivated as the Bayes rule under $\pi_U(\mu) \equiv 1$ in the quadratic risk problem.

George, Liang and Xu (2006) recently drew out these parallels between the KL risk and quadratic risk problems, and found that they could be explained by connections between unbiased estimates of risk. These connections were shown to yield analogous sufficient conditions for the minimaxity of Bayes rules in both problems. In this paper, we establish further parallels concerning the characterization of admissibility in both problems. As proper Bayes rules are easily shown to be admissible in the KL setting, see Section 4.8.1 in Berger (1985), our focus will be on improper $\pi$ under which $\hat p_\pi(y \mid x)$ is sometimes more precisely called a formal or generalized Bayes rule. In Section 2, we establish sufficient conditions for the admissibility of Bayes rules $\hat p_\pi(y \mid x)$ under KL loss, conditions analogous to those of Brown (1971) and Brown and Hwang (1982). In Section 3, we prove that all admissible procedures for the KL risk problem are Bayes rules, a direct parallel of the complete class theorem of Brown (1971) for quadratic risk.

It might be of interest to note that when $v_y \to 0$, $p(y \mid \mu)$ degenerates to a point mass $I\{y = \mu\}$ and that by (5),

$\hat p_\pi(y \mid x) = \int p(y \mid \mu)\, \pi(\mu \mid x)\,d\mu \to \pi(y \mid x).$

Therefore, the limiting KL risk of a Bayes rule $\hat p_\pi$ is

$\lim_{v_y \to 0} R_{KL}(\mu, \hat p_\pi) = E_\mu \left[ I\{y = \mu\} \log \dfrac{I\{y = \mu\}}{\pi(y \mid X)} \right] = -E_\mu \log \pi(\mu \mid X),$

where the right-hand side can be viewed as the KL risk for "estimating a point mass at $\mu$" by a posterior density. Thus, our setup can provide a decision theoretic framework for evaluating a prior by the extent to which $E_\mu \log \pi(\mu \mid X)$ is large for all $\mu$.
2. Sufficient conditions for admissibility. For $Z \sim N_p(\mu, I)$, Brown (1971) and Brown and Hwang (1982) developed general sufficient conditions for the admissibility of formal Bayes rules for the quadratic risk problem. To utilize their results and obtain analogous sufficient conditions for the KL risk problem, we first establish a relationship between KL risk and quadratic risk. In this section, we assume that the prior measure $M$ has a density $\pi$ and that $R_{KL}(\mu, \hat p_\pi) < \infty$ for all $\mu \in \mathbb{R}^p$. Let

(8)    $m_\pi(z; v) = \int p(z \mid \mu)\, \pi(\mu)\,d\mu$

be the marginal density of $Z \sim N_p(\mu, vI)$ under $\pi$.

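Theorem 1. Let $\pi$ be a prior density on $\mu$ such that $m_\pi(z; v_x)$ is finite for all $z$. Then

(9)    $R_{KL}(\mu, \hat p_{\pi_U}) - R_{KL}(\mu, \hat p_\pi) = \dfrac{1}{2} \int_{v_w}^{v_x} \dfrac{1}{v^2} \left[ R_Q^v(\mu, \hat\mu_{MLE}) - R_Q^v(\mu, \hat\mu_\pi) \right] dv,$

where $v_w = v_x v_y / (v_x + v_y) < v_x$.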
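The identity in Theorem 1, namely that the KL risk difference equals $\frac{1}{2} \int_{v_w}^{v_x} v^{-2} [R_Q^v(\mu, \hat\mu_{MLE}) - R_Q^v(\mu, \hat\mu_\pi)]\,dv$, can be checked numerically in a case where every term is available in closed form. The sketch below assumes a normal prior $\pi = N_p(0, \tau I)$, which is not a prior considered in the paper and is chosen only because its Bayes rules are explicit: the posterior mean is $\tau z / (\tau + v)$ and the predictive density is normal. All function names are illustrative.

```python
import math

def kl_risk_normal_prior(mu_sq, p, v_x, v_y, tau):
    """KL risk of the Bayes rule under the N_p(0, tau I) prior.

    The rule is the N_p(a x, s I) density with a = tau / (tau + v_x) and
    s = v_y + tau * v_x / (tau + v_x); mu_sq stands for ||mu||^2.
    """
    a = tau / (tau + v_x)
    s = v_y + tau * v_x / (tau + v_x)
    mean_sq_err = (1 - a) ** 2 * mu_sq + a ** 2 * p * v_x   # E||mu - a X||^2
    return 0.5 * (p * v_y / s + mean_sq_err / s - p + p * math.log(s / v_y))

def kl_risk_uniform(p, v_x, v_y):
    """KL risk of the uniform-prior rule N_p(x, (v_x + v_y) I)."""
    return 0.5 * p * math.log(1 + v_x / v_y)

def quad_risk_gap(mu_sq, p, v, tau):
    """R_Q^v(mu, MLE) - R_Q^v(mu, posterior mean) under the normal prior."""
    a = tau / (tau + v)
    return p * v - (a ** 2 * p * v + (1 - a) ** 2 * mu_sq)

def integral_side(mu_sq, p, v_x, v_y, tau, n=2000):
    """The integral side of Theorem 1 by composite Simpson's rule (n even)."""
    v_w = v_x * v_y / (v_x + v_y)
    h = (v_x - v_w) / n

    def f(v):
        return quad_risk_gap(mu_sq, p, v, tau) / v ** 2

    total = f(v_w) + f(v_x)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(v_w + k * h)
    return 0.5 * total * h / 3

mu_sq, p, v_x, v_y, tau = 2.0, 3, 1.0, 0.5, 4.0
lhs = kl_risk_uniform(p, v_x, v_y) - kl_risk_normal_prior(mu_sq, p, v_x, v_y, tau)
print(lhs, integral_side(mu_sq, p, v_x, v_y, tau))
```

The two printed numbers should agree up to quadrature error, which illustrates how a quadratic-risk improvement at every variance level between $v_w$ and $v_x$ accumulates into a KL-risk improvement.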

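Proof. Let $m_\pi(w; v_w)$ denote the marginal density under $\pi$ of

(10)    $W = \dfrac{v_y X + v_x Y}{v_x + v_y} \sim N_p(\mu, v_w I).$

By Lemmas 2 and 3 of George, Liang and Xu (2006),

(11)    $R_{KL}(\mu, \hat p_{\pi_U}) - R_{KL}(\mu, \hat p_\pi) = E_{\mu, v_w} \log m_\pi(W; v_w) - E_{\mu, v_x} \log m_\pi(X; v_x),$

$m_\pi(z; v)$ is finite for any $v_w \le v \le v_x$, and

(12)    $\dfrac{\partial}{\partial v} E_{\mu, v} \log m_\pi(Z; v) = E_{\mu, v} \left[ 2\, \dfrac{\nabla^2 \sqrt{m_\pi(Z; v)}}{\sqrt{m_\pi(Z; v)}} \right],$

where $\nabla^2 g(z) = \sum_i \frac{\partial^2}{\partial z_i^2} g(z)$, and $E_{\mu, v}(\cdot)$ stands for expectation with respect to the $N(\mu, vI)$ distribution. Furthermore, Stein (1974, 1981) showed that for the quadratic risk problem

(13)    $R_Q^v(\mu, \hat\mu_{MLE}) - R_Q^v(\mu, \hat\mu_\pi) = -4 v^2 E_{\mu, v} \left[ \dfrac{\nabla^2 \sqrt{m_\pi(Z; v)}}{\sqrt{m_\pi(Z; v)}} \right].$

Combining (11), (12) and (13), the theorem follows. □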
Now let $B_{KL}(\pi, \hat p) = \int R_{KL}(\mu, \hat p)\, \pi(\mu)\,d\mu$ and $B_Q^v(\pi, \hat\mu) = \int R_Q^v(\mu, \hat\mu)\, \pi(\mu)\,d\mu$ be the average KL and quadratic risks over $\pi$. The following relationship between the average KL risk difference and the average quadratic risk difference of Bayes rules follows from (9) and averaging over a prior $\pi_n$ that satisfies $\int_{\mathbb{R}^p} \pi_n(\mu)\,d\mu < \infty$.

Corollary 1. Let $\pi$ and $\pi_n$ be priors on $\mu$ such that $m_\pi(z; v_x)$ and $m_{\pi_n}(z; v_x)$ are finite for all $z$. Furthermore, assume $\pi_n$ satisfies $\int_{\mathbb{R}^p} \pi_n(\mu)\,d\mu < \infty$. Then

(14)    $B_{KL}(\pi_n, \hat p_\pi) - B_{KL}(\pi_n, \hat p_{\pi_n}) = \dfrac{1}{2} \int_{v_w}^{v_x} \dfrac{1}{v^2} \left[ B_Q^v(\pi_n, \hat\mu_\pi) - B_Q^v(\pi_n, \hat\mu_{\pi_n}) \right] dv.$



Corollary 1 enables us to extend the approach of Brown and Hwang (1982) 
to establish conditions for the admissibility of formal Bayes rules in the KL 
risk problem. As in Brown and Hwang (1982), we use Blyth's method which 
can be extended to any statistical estimation problem with a strictly convex 
loss function [Brown (1971)]. 

Lemma 1. Let $\hat p$ be such that $R_{KL}(\mu, \hat p) < \infty$ for all $\mu \in \mathbb{R}^p$. If there exists a sequence of densities $\{\pi_n\}$ such that $\int_{\mathbb{R}^p} \pi_n(\mu)\,d\mu < \infty$, $\int_{\|\mu\| \le 1} \pi_n(\mu)\,d\mu \ge c$ for some positive constant $c$, and

(15)    $B_{KL}(\pi_n, \hat p) - B_{KL}(\pi_n, \hat p_{\pi_n}) \to 0,$

then $\hat p$ is admissible.

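Proof. Suppose $\hat p$ is not admissible. Then there is a $\hat p'$ such that $R_{KL}(\mu, \hat p') \le R_{KL}(\mu, \hat p)$ with strict inequality for some $\mu$. Let $\hat p'' = (\hat p + \hat p')/2$. Thus

$R_{KL}(\mu, \hat p'') = \int\!\!\int p(x \mid \mu)\, p(y \mid \mu) \log \dfrac{p(y \mid \mu)}{\hat p''(y \mid x)}\,dx\,dy$

$\quad = \int\!\!\int p(x \mid \mu)\, p(y \mid \mu) \left[ \log p(y \mid \mu) - \log\left( \tfrac{1}{2} \hat p(y \mid x) + \tfrac{1}{2} \hat p'(y \mid x) \right) \right] dx\,dy$

$\quad < \int\!\!\int p(x \mid \mu)\, p(y \mid \mu) \left[ \log p(y \mid \mu) - \tfrac{1}{2} \left( \log \hat p(y \mid x) + \log \hat p'(y \mid x) \right) \right] dx\,dy$

$\quad = \tfrac{1}{2} \left( R_{KL}(\mu, \hat p) + R_{KL}(\mu, \hat p') \right) \le R_{KL}(\mu, \hat p).$

Since $R_{KL}(\mu, \hat p)$ and $R_{KL}(\mu, \hat p'')$ are both continuous in $\mu$, there exists an $\epsilon > 0$ such that for all $\mu \in \{\mu : \|\mu\| \le 1\}$,

$R_{KL}(\mu, \hat p) - R_{KL}(\mu, \hat p'') > \epsilon > 0.$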
Therefore, we have

$B_{KL}(\pi_n, \hat p) - B_{KL}(\pi_n, \hat p_{\pi_n}) \ge B_{KL}(\pi_n, \hat p) - B_{KL}(\pi_n, \hat p'') \ge \epsilon \cdot c > 0,$

which contradicts (15). The admissibility of $\hat p$ follows. □

We assume without loss of generality that the coordinate system is chosen so that $\int_{\|\mu\| \le 1} \pi(\mu)\,d\mu \ge c$ for some positive constant $c$. Using Lemma 1, we extend the approach of Brown and Hwang (1982) to obtain the following.

Theorem 2. A formal Bayes rule $\hat p_\pi$ is admissible under KL loss if for every $v \in [v_w, v_x]$, the improper $\pi$ satisfies both:

(i) the growth condition:

(16)    $\int_{\mathbb{R}^p - S} \dfrac{\pi(\mu)}{\|\mu\|^2 \log^2(\|\mu\| \vee 2)}\,d\mu < \infty,$

where $S = \{\mu : \|\mu\| \le 1\}$ and $a \vee b = \max\{a, b\}$, and

(ii) the asymptotic flatness condition:

(17)    $\int\!\!\int \pi(\mu) \left\| \dfrac{m_{\nabla\pi}(z; v)}{m_\pi(z; v)} - \dfrac{\nabla\pi(\mu)}{\pi(\mu)} \right\|^2 p(z \mid \mu)\,d\mu\,dz < \infty.$



Proof. For $v = 1$, Brown and Hwang showed that when the prior density $\pi$ satisfies the growth condition (16) and the asymptotic flatness condition (17), there exists a sequence of densities $\{\pi_n\}$ such that $\int_{\|\mu\| \le 1} \pi_n(\mu)\,d\mu = \int_{\|\mu\| \le 1} \pi(\mu)\,d\mu \ge c$ and that $B_Q^{v=1}(\pi_n, \hat\mu_\pi) - B_Q^{v=1}(\pi_n, \hat\mu_{\pi_n}) \to 0$. Furthermore, they showed that an explicit construction of such a sequence $\{\pi_n\}$ is obtained by defining

(18)    $j_n(\mu) = \begin{cases} 1, & \|\mu\| \le 1, \\ 1 - \dfrac{\log(\|\mu\|)}{\log(n)}, & 1 < \|\mu\| \le n, \\ 0, & \|\mu\| > n, \end{cases}$

for $n = 2, 3, \ldots$, and letting

(19)    $\pi_n(\mu) = j_n^2(\mu)\, \pi(\mu).$

It is straightforward to show that the above construction also works for general $v$. That is, for any $v$, if $\pi$ satisfies conditions (16) and (17), then for the sequence $\{\pi_n\}$ obtained by (18) and (19), $\Delta_{n,v} = B_Q^v(\pi_n, \hat\mu_\pi) - B_Q^v(\pi_n, \hat\mu_{\pi_n}) \to 0$. It thus follows that if $\pi$ satisfies conditions (16) and (17) for every $v \in [v_w, v_x]$, then by Corollary 1 and by the continuity in $v$ of $\Delta_{n,v}$,

(20)    $B_{KL}(\pi_n, \hat p_\pi) - B_{KL}(\pi_n, \hat p_{\pi_n}) = \dfrac{1}{2} \int_{v_w}^{v_x} \dfrac{1}{v^2} \Delta_{n,v}\,dv \to 0.$

That $\hat p_\pi$ is admissible now follows immediately from Lemma 1. □
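The truncating sequence used in this proof is simple to write down. The sketch below assumes the cutoff of Brown and Hwang (1982) in the piecewise form $j_n(\mu) = 1$ for $\|\mu\| \le 1$, $j_n(\mu) = 1 - \log\|\mu\| / \log n$ for $1 < \|\mu\| \le n$, and $j_n(\mu) = 0$ beyond, with $\pi_n = j_n^2 \pi$. The function names and the restriction to radial priors are illustrative choices.

```python
import math

def j_n(norm_mu, n):
    """Cutoff: 1 on the unit ball, 0 beyond radius n, log-linear in between."""
    if norm_mu <= 1.0:
        return 1.0
    if norm_mu >= n:
        return 0.0
    return 1.0 - math.log(norm_mu) / math.log(n)

def pi_n(norm_mu, n, prior):
    """Truncated density pi_n = j_n^2 * pi for a radial prior pi(||mu||)."""
    return j_n(norm_mu, n) ** 2 * prior(norm_mu)

def uniform_prior(norm_mu):
    """pi_U, used here only as an example of an improper prior."""
    return 1.0

for r in (0.5, 2.0, 4.0, 16.0, 20.0):
    print(r, pi_n(r, 16, uniform_prior))
```

Each $\pi_n$ agrees with $\pi$ on the unit ball and vanishes outside radius $n$, so it has finite total mass whenever $\pi$ is locally integrable, while $\pi_n \to \pi$ pointwise as $n \to \infty$.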
Example 1 (Uniform prior). Let $\pi_U(\mu) \equiv 1$ for any $\mu$; then $\nabla\pi_U = 0$. In this case, the conditions of Theorem 2 are easy to verify when $p = 1$ or 2. Therefore, the formal Bayes rule $\hat p_{\pi_U}$ is admissible when $p = 1$ or 2.
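For the uniform prior the growth condition of Theorem 2 reduces, in polar coordinates and up to the surface area of the unit sphere, to the convergence of $\int_1^\infty r^{p-3} / \log^2(r \vee 2)\,dr$. The following sketch evaluates this radial integral truncated at a large radius $R$; the function name and the quadrature settings are illustrative, and the flatness condition is not examined here.

```python
import math

def radial_growth_integral(p, R, n=20000):
    """Integral of r^(p-3) / log^2(max(r, 2)) over 1 < r < R, for R > 2."""
    # Part 1: 1 < r < 2, where the log factor is the constant log(2)^2.
    if p == 2:
        inner = math.log(2.0)
    else:
        inner = (2.0 ** (p - 2) - 1.0) / (p - 2)
    inner /= math.log(2.0) ** 2
    # Part 2: 2 < r < R, substituting r = exp(t), by composite Simpson's rule.
    a, b = math.log(2.0), math.log(R)
    h = (b - a) / n

    def f(t):
        return math.exp((p - 2) * t) / t ** 2

    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return inner + total * h / 3

for p in (1, 2, 3):
    print(p, [round(radial_growth_integral(p, R), 4) for R in (1e2, 1e4, 1e6)])
```

For $p = 1$ and $p = 2$ the values stabilize as $R$ grows (for $p = 2$ the limit is $2 / \log 2$), whereas for $p = 3$ they grow without bound, in line with the uniform prior satisfying the growth condition only in dimensions one and two.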

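It was pointed out in Brown and Hwang (1982) that if

(21)    $\pi(\mu) \le \|\mu\|^{2-p}, \quad \dfrac{\|\nabla\pi(\mu)\|}{\pi(\mu)} = O(\|\mu\|^{-1}) \quad \text{and} \quad \dfrac{|\partial^2 \pi(\mu) / \partial\mu_i \partial\mu_j|}{\pi(\mu)} = O(\|\mu\|^{-2}),$

then (16) is easy to check and (17) can be verified with some difficulty [extending Lemma 3.4.1 of Brown (1971)]. Hence, by Theorem 2, the corresponding $\hat p_\pi$ is admissible under KL loss.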

Example 2 (Harmonic prior). Let $\pi_H(\mu) = \|\mu\|^{-(p-2)}$ for $p \ge 3$. Because this prior satisfies (21), the formal Bayes rule $\hat p_{\pi_H}$ is admissible when $p \ge 3$.
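One of the requirements on the harmonic prior is a gradient bound of the form $\|\nabla\pi(\mu)\| / \pi(\mu) = O(\|\mu\|^{-1})$. For $\pi_H(\mu) = \|\mu\|^{-(p-2)}$ the ratio equals $(p - 2) / \|\mu\|$ exactly, which the following sketch confirms with a central-difference gradient. The helper names and the test points are illustrative.

```python
import math

def harmonic_prior(mu):
    """pi_H(mu) = ||mu||^{-(p-2)} for mu in R^p, p >= 3."""
    p = len(mu)
    return math.sqrt(sum(m * m for m in mu)) ** (-(p - 2))

def grad_norm(f, mu, h=1e-5):
    """Euclidean norm of a central-difference gradient of f at mu."""
    sq = 0.0
    for i in range(len(mu)):
        up = list(mu)
        dn = list(mu)
        up[i] += h
        dn[i] -= h
        sq += ((f(up) - f(dn)) / (2 * h)) ** 2
    return math.sqrt(sq)

for scale in (1.0, 10.0, 100.0):
    mu = [scale, 2 * scale, -scale]            # p = 3
    norm = math.sqrt(sum(m * m for m in mu))
    ratio = grad_norm(harmonic_prior, mu) / harmonic_prior(mu)
    print(norm, ratio * norm)                  # analytically p - 2
```

The product of the ratio and $\|\mu\|$ stays constant across scales, so the bound holds with constant $p - 2$.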

The following corollary is similarly a straightforward extension from 
Brown and Hwang (1982). It replaces condition (17) of Theorem 2 with a 
condition that is slightly less general, but more transparent and easier to 
verify. 

Corollary 2. If an improper density $\pi$ satisfies (16) and

(22)    $\int \dfrac{\|\nabla\pi(\mu)\|^2}{\pi(\mu)}\,d\mu < \infty,$

then the formal Bayes rule $\hat p_\pi$ is admissible under KL loss.

Finally, it was also pointed out in Brown and Hwang (1982) that if

(23)    $\pi(\mu) \le \|\mu\|^{2-p-\epsilon} \text{ for some } \epsilon > 0 \quad \text{and} \quad \dfrac{\|\nabla\pi(\mu)\|}{\pi(\mu)} = O(\|\mu\|^{-1}),$

then (16) and (22) are easy to check. Hence, by Corollary 2, the corresponding $\hat p_\pi$ is admissible under KL loss.

There have been a few treatments of related problems yielding admissibility results in the same spirit as the above. In particular, Eaton (1982) formulates a prediction problem similar to the above, but under an integrated quadratic ($\mathcal{L}_2$) loss function, rather than our KL loss. Gatsonis (1984) discusses a related problem of estimating an unknown prior under this quadratic loss. Gatsonis proves an admissibility result in his setting for the Bayes procedure for the uniform prior. Gatsonis' methods do not easily apply to problems involving Bayes procedures for (generalized) priors other than the uniform prior. Eaton [(1992), Section 6] considers a prediction problem like ours, but with a different type of loss function. This loss function is bounded, and leads to a problem that is "quadratically regular" in the sense of that paper. For such quadratically regular problems the results of Eaton (1992) show admissibility for a specified class of prior measures. It is shown in Theorem 5.2 of Eaton et al. (2007) that Eaton's class of prior measures contains most of the densities covered by our Theorem 2, and vice versa.

3. A complete class theorem. We now turn to establishing that all (gen- 
eralized) Bayes rules form a complete class for the KL loss problem. In 
Section 3.1, we begin by first establishing properties of some modified ac- 
tion spaces and the KL loss function. We then make use of these properties 
in Section 3.2 where we prove our main complete class results. 

3.1. Preliminary lemmas. Because the true density $p(y \mid \mu)$ is bounded by a constant $C = (2\pi v_y)^{-p/2}$ for any $\mu$, it will eventually be useful to restrict attention to bounded density estimates. Let

(24)    $\mathcal{A} = \left\{ g : \mathbb{R}^p \to \mathbb{R} \text{ such that } 0 \le g(y) \le C \text{ a.e. and } \int g(y)\,dy = 1 \right\}.$

Obviously, $\mathcal{A}$ is a subset of the action space $\mathcal{A}_0$ that is defined in (1).

The following lemma, which is proved in the Appendix, shows that no 
admissible actions are lost by restricting the action space to A. 

Lemma 2. Suppose $g_0(\cdot) \in \mathcal{A}_0$. If $g_0 \notin \mathcal{A}$, that is, $g_0 > C$ on a set $S \subset \mathbb{R}^p$ with positive measure, then there exists a $g \in \mathcal{A}$ that dominates $g_0$ in the sense that $L(\mu, g_0) > L(\mu, g)$ for all $\mu$.

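It will also be useful to consider extending $\mathcal{A}$ to its closure

(25)    $\mathcal{A}^* = \left\{ g : \mathbb{R}^p \to \mathbb{R} \text{ such that } 0 \le g(y) \le C \text{ a.e. and } \int g(y)\,dy \le 1 \right\},$

and then to make use of the topological properties of $\mathcal{A}^*$. Because $\mathcal{A}^*$ is a subset of the Banach space $\mathcal{L}_\infty$, we will consider the topology on $\mathcal{A}^*$ induced by the weak* topology on $\mathcal{L}_\infty$. Under this weak* topology, a sequence $\{g_i\} \subset \mathcal{A}^*$ converges to a $g \in \mathcal{A}^*$ if

(26)    $\int f(y)\, g_i(y)\,dy \to \int f(y)\, g(y)\,dy \quad \forall f \in \mathcal{L}_1.$

We will eventually make use of the following properties of $\mathcal{A}^*$ under the weak* topology.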

Lemma 3. Define the action space $\mathcal{A}^*$ as in (25), then:

(i) $\mathcal{A}^*$ is weak* compact.

(ii) The weak* topology on $\mathcal{A}^*$ is metrizable by

(27)    $\rho(g, h) = \sum_{k=1}^{\infty} 2^{-k} \left| \int [g(y) - h(y)]\, f_k(y)\,dy \right| \quad \text{for any } g, h \in \mathcal{A}^*,$

where $\{f_k, k = 1, 2, \ldots\}$ is a countable dense subset of $\mathcal{L}_1$. And $\mathcal{A}^*$ is separable and second countable under this metric (27).

(iii) Suppose $g^*(\cdot) \in \mathcal{A}^*$. If $g^* \notin \mathcal{A}$, then there exists a $g \in \mathcal{A}$ that dominates $g^*$ in the sense that $L(\mu, g^*) > L(\mu, g)$ for all $\mu$. Thus, the extension from $\mathcal{A}$ to $\mathcal{A}^*$ does not incur any new admissible actions.

Finally, we also need to make use of the following properties of the 
Kullback-Leibler loss function. 

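Lemma 4. For the KL loss function $L(\mu, \cdot)$ in (2):

(i) $L(\mu, \cdot)$ is lower semi-continuous on $\mathcal{A}^*$, that is, if $\{g_i\}, g \in \mathcal{A}^*$ and $g_i \to g \in \mathcal{A}^*$ weak*, then

(28)    $\liminf_{i \to \infty} L(\mu, g_i) \ge L(\mu, g) \quad \forall \mu \in \mathbb{R}^p;$

(ii) $L(\mu, \cdot)$ is strictly convex on

(29)    $\mathcal{A}^*_+ = \{ g : g \in \mathcal{A}^* \text{ and } L(\mu, g) < \infty \text{ for } \forall \mu \}$

for any $\mu \in \mathbb{R}^p$.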
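The strict convexity of the loss in its second argument can be illustrated numerically in one dimension. The sketch below approximates $L(\mu, g) = \int p(y \mid \mu) \log [p(y \mid \mu) / g(y)]\,dy$ by a midpoint rule and compares the loss of a mixture $\lambda g_1 + (1 - \lambda) g_2$ with the corresponding mixture of losses. The two normal densities, the weight $\lambda = 0.3$ and the grid are arbitrary illustrative choices.

```python
import math

def normal_pdf_1d(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def kl_loss_grid(mu, v_y, g, lo=-15.0, hi=15.0, n=6000):
    """Midpoint-rule approximation of the KL loss L(mu, g) for p = 1."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        p_y = normal_pdf_1d(y, mu, v_y)
        if p_y > 0.0:
            total += p_y * math.log(p_y / g(y)) * h
    return total

def g1(y):
    return normal_pdf_1d(y, -1.0, 1.5)

def g2(y):
    return normal_pdf_1d(y, 1.0, 2.0)

lam = 0.3

def g_mix(y):
    return lam * g1(y) + (1 - lam) * g2(y)

mu, v_y = 0.5, 1.0
mix_loss = kl_loss_grid(mu, v_y, g_mix)
bound = lam * kl_loss_grid(mu, v_y, g1) + (1 - lam) * kl_loss_grid(mu, v_y, g2)
print(mix_loss, bound)
```

The first printed value is strictly below the second, which is the inequality exploited in Lemma 1 (with $\lambda = 1/2$) and in Theorem 3 to replace a randomized rule by its average.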

3.2. The main theorems. Having established Lemmas 2, 3 and 4 in Section 3.1, we are now ready to prove that all admissible procedures for the normal density prediction problem under KL loss are (generalized) Bayes rules. This proof consists of three steps:

(i) All the admissible procedures are nonrandomized.

(ii) For any admissible procedure $\hat p(\cdot \mid x)$, there exists a sequence of priors $M_i(\mu)$ such that $\hat p_{M_i}(\cdot \mid x) \to \hat p(\cdot \mid x)$ for almost every $x$ under the weak* topology (26).

(iii) We can find a subsequence $\{M_{i'}\}$ and a limit prior $M$ such that $\hat p_{M_{i'}}(\cdot \mid x) \to \hat p_M(\cdot \mid x)$ weak* for almost every $x$. Therefore, $\hat p(\cdot \mid x) = \hat p_M(\cdot \mid x)$ for a.e. $x$, that is, $\hat p(\cdot \mid x)$ is a (generalized) Bayes rule.

Theorem 3. All nonrandomized procedures form a complete class. 

Proof. Let $\delta : \mathbb{R}^p \to P(\mathcal{A}_0)$ be an admissible and randomized procedure, where $P(\mathcal{A}_0)$ denotes the space of probability distributions over $\mathcal{A}_0$. We first prove that $\delta(x) \in P(\mathcal{A}) \subset P(\mathcal{A}^*)$ for a.e. $x$. Suppose there exists a set $K$ such that $K$ has positive measure and for each $x \in K$, $\delta(\cdot \mid x) = \hat p(\cdot \mid x) \notin P(\mathcal{A}^*)$ with a positive probability. Then by Lemma 2, we can find $g_x \in P(\mathcal{A}^*)$ that satisfies $L(\mu, g_x) < L(\mu, \hat p(\cdot \mid x))$ for all $\mu$, and therefore $\delta$ is dominated by the decision rule $\delta'$ that substitutes $g_x$ for $\hat p(\cdot \mid x)$. This contradicts the admissibility of $\delta$.

Now let $\hat p^*(y \mid x) = E^{\delta(x)}(g(y))$. It can be seen that $\hat p^*(y \mid x) \in \mathcal{A}$ since $\delta(x) \in P(\mathcal{A}) \subset P(\mathcal{A}^*)$ for a.e. $x$. By Lemma 4(ii) and Jensen's inequality,

(30)    $L(\mu, \hat p^*(y \mid x)) \le E^{\delta(x)}(L(\mu, g(y))) = L(\mu, \delta(y \mid x)) \quad \forall \mu.$

Furthermore, strict inequality holds in (30) unless either $\delta(\cdot \mid x)$ is nonrandomized with probability 1 or $L(\mu, \delta(y \mid x)) = \infty$, which implies that $\delta$ can be dominated by a finite-risk nonrandomized procedure. Therefore, it contradicts that $\delta$ is admissible and randomized. It then follows that the nonrandomized procedures are a complete class. □

Theorem 3 shows that we can restrict attention to nonrandomized procedures $\hat p(\cdot \mid x)$. Next we prove that for a.e. $x$, all admissible procedures are limits of Bayes rules (5). Since the Bayes rules are also nonrandomized, this convergence can be evaluated with respect to the weak* topology for each $x$.

Theorem 4. For any admissible procedure $\hat p(\cdot \mid x)$, there exists a sequence of priors $\{M_i\}$ supported on finite sets such that $\hat p_{M_i}(\cdot \mid x) \to \hat p(\cdot \mid x)$ weak* for a.e. $x$ under the topology (26).

Proof. This is essentially Theorem 4A.12 of Brown (1986). There are some minor differences between the formulations there and here, which we now note in order to clarify how Theorem 4A.12 yields the current Theorem 4. The principal difference is that the action space in Brown (1986) was assumed to be Euclidean, whereas the action space $\mathcal{A}^*$ here is merely compact, separable, and metrizable. Because the space $\mathcal{A}^*$ here is compact, the one-point compactification $\{i\}$ introduced in Brown (1986) is not needed. This simplifies the proof of Proposition 4A.11 there, which in our context becomes Theorem 3. The remainder of the proof proceeds as discussed in the text of the proof of Theorem 4A.12. □

Theorem 4 establishes that any admissible procedure $\hat p(\cdot \mid x)$ is a limit of Bayes rules for a.e. $x$. To prove $\hat p(\cdot \mid x)$ itself is also a (generalized) Bayes rule, we need to find a (possibly improper) prior $M$ such that $\hat p_M(\cdot \mid x) = \hat p(\cdot \mid x)$ for a.e. $x$.

Theorem 5. The set of all generalized Bayes procedures is a complete 
class of procedures. 
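Proof. Suppose $\hat p(\cdot \mid x)$ is an admissible procedure. Then by Theorem 4, there exists a sequence of measures $M_i$ supported on finite sets such that $\hat p_{M_i}(\cdot \mid x) \to \hat p(\cdot \mid x)$ for a.e. $x$ under the weak* topology (26).

Let

$r_i = \int_{\|x\| \le 1} \int p(x \mid \mu)\, M_i(d\mu)\,dx,$

then $r_i > 0$ since $p(x \mid \mu) > 0$ for all $x$ and $\mu$. Thus we can define a new sequence of measures $M_i'$ by $M_i' = M_i / r_i$. It is easy to check that $\hat p_{M_i'} = \hat p_{M_i} \to \hat p$ weak* a.e. and that

(31)    $\int_{\|x\| \le 1} \int p(x \mid \mu)\, M_i'(d\mu)\,dx = 1.$

By 2.16(iv) of Brown (1986), there exists a finite limiting measure $M$ such that $M_i' \to M$.

Let $S$ be the biggest convex set that satisfies

$\liminf_{i \to \infty} \sup_{x \in S} \int p(x \mid \mu)\, M_i'(d\mu) < \infty.$

[The existence of $S$ follows from (31).] Then by Theorem 2.17 in Brown (1986), for any $x$ in the interior of $S$,

(32)    $\int p(x \mid \mu)\, M_i'(d\mu) \to \int p(x \mid \mu)\, M(d\mu) \quad \text{as } i \to \infty.$

In fact, we can prove that the closure $\bar S = \mathbb{R}^p$. [Otherwise its complement $S^c$ has positive measure and at every $x \in S^c$, $\liminf_{i \to \infty} \int p(x \mid \mu)\, M_i'(d\mu) = \infty$. Therefore,

$\lim_{i \to \infty} \int_{\|y\| \le 1} \hat p_{M_i'}(y \mid x)\,dy = \lim_{i \to \infty} \int_{\|y\| \le 1} \dfrac{\int p(x \mid \mu)\, p(y \mid \mu)\, M_i'(d\mu)}{\int p(x \mid \mu)\, M_i'(d\mu)}\,dy$

$\quad \le (2\pi v_x)^{-p/2} \lim_{i \to \infty} \dfrac{\int_{\|y\| \le 1} \int p(y \mid \mu)\, M_i'(d\mu)\,dy}{\int p(x \mid \mu)\, M_i'(d\mu)}$

$\quad = 0,$

which implies $\int_{\|y\| \le 1} \hat p(y \mid x)\,dy = 0$ and thus $R_{KL}(\mu, \hat p) = \infty$. This would contradict the assumed admissibility of $\hat p$.] Hence, (32) holds for a.e. $x$.

Furthermore, by the dominated convergence, for a.e. $x$ and $y$,

(33)    $\int p(x \mid \mu)\, p(y \mid \mu)\, M_i'(d\mu) \to \int p(x \mid \mu)\, p(y \mid \mu)\, M(d\mu).$

Combining (32) and (33), we obtain

$\hat p_{M_i'}(y \mid x) = \dfrac{\int p(x \mid \mu)\, p(y \mid \mu)\, M_i'(d\mu)}{\int p(x \mid \mu)\, M_i'(d\mu)} \to \dfrac{\int p(x \mid \mu)\, p(y \mid \mu)\, M(d\mu)}{\int p(x \mid \mu)\, M(d\mu)} = \hat p_M(y \mid x)$

for a.e. $x$ and $y$, so $\hat p_{M_i'}$ also converges to $\hat p_M(y \mid x)$ under the weak* topology. Therefore, $\hat p = \hat p_M$ is a generalized Bayes procedure. □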
APPENDIX

In this appendix, we provide the proofs of Lemmas 2, 3 and 4 from Section 
3.1. 

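Proof of Lemma 2. (i) Suppose $g_0 = 0$ on a set with positive measure. Then by definition $L(\mu, g_0) = \infty$ for any $\mu$. So any $g \in \mathcal{A}$ with finite risk dominates it and thus $g_0$ is inadmissible.

(ii) Suppose $g_0 > 0$ almost everywhere. If $g_0 > C$ on a set $S$ with Lebesgue measure $\nu(S) > 0$, then a $g$ can be constructed by truncating $g_0$ on $S$ and lifting it in the other areas. Notice that $\int_{S^c} g_0(y)\,dy > 0$, so we can define

(34)    $c = \dfrac{1 - C\,\nu(S)}{\int_{S^c} g_0(y)\,dy},$

where $S^c$ is the complement of $S$. It is easy to check $c > 1$. Let

(35)    $g(y) = \begin{cases} C, & \text{if } y \in S, \\ c\, g_0(y), & \text{if } y \in S^c. \end{cases}$

Obviously, $g \in \mathcal{A}$. For any $\mu$, the difference between the loss functions of $g_0$ and $g$ is

$L(\mu, g_0) - L(\mu, g) = \int p(y \mid \mu) \log g(y)\,dy - \int p(y \mid \mu) \log g_0(y)\,dy$

$\quad = \int_S p(y \mid \mu) \log C\,dy + \int_{S^c} p(y \mid \mu) \log(c\, g_0(y))\,dy - \int p(y \mid \mu) \log g_0(y)\,dy$

$\quad = \int_S p(y \mid \mu) \log C\,dy + \log c \int_{S^c} p(y \mid \mu)\,dy + \int_{S^c} p(y \mid \mu) \log g_0(y)\,dy - \int p(y \mid \mu) \log g_0(y)\,dy$

$\quad = \int_S p(y \mid \mu) \log C\,dy + \log c \int_{S^c} p(y \mid \mu)\,dy - \int_S p(y \mid \mu) \log g_0(y)\,dy$

$\quad = \log c - \int_S p(y \mid \mu) \log \dfrac{c\, g_0(y)}{C}\,dy$

$\quad \ge \log c - \log \int_S p(y \mid \mu)\, \dfrac{c\, g_0(y)}{C}\,dy \quad \text{(Jensen's inequality)}$

$\quad \ge \log c - \log \int_S c\, g_0(y)\,dy$

$\quad > 0.$

The last strict inequality holds because $\int_S g_0(y)\,dy = 1 - \int_{S^c} g_0(y)\,dy < 1$. Therefore, $g$ dominates $g_0$. □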
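The truncate-and-lift step of this proof can be tried on a grid in one dimension. The sketch below assumes the construction takes the form $g = C$ on $S = \{g_0 > C\}$ and $g = c\, g_0$ on $S^c$, with $c = (1 - C \nu(S)) / \int_{S^c} g_0(y)\,dy$ chosen so that $g$ integrates to one. The starting density $g_0 = N(0, 0.25)$, the grid and the values of $\mu$ are illustrative choices.

```python
import math

def normal_pdf_1d(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

v_y = 1.0
C = (2 * math.pi * v_y) ** -0.5          # bound on p(y | mu) when p = 1
n, lo, hi = 24000, -12.0, 12.0
h = (hi - lo) / n
grid = [lo + (k + 0.5) * h for k in range(n)]

g0 = [normal_pdf_1d(y, 0.0, 0.25) for y in grid]   # peak exceeds C
in_S = [v > C for v in g0]                          # S = {g0 > C}

nu_S = h * sum(in_S)                                # Lebesgue measure of S
mass_Sc = h * sum(v for v, s in zip(g0, in_S) if not s)
c = (1.0 - C * nu_S) / mass_Sc                      # lifting constant
g = [C if s else c * v for v, s in zip(g0, in_S)]   # truncated and lifted

def kl_loss(mu, dens):
    """Riemann-sum approximation of L(mu, dens) on the grid."""
    total = 0.0
    for y, d in zip(grid, dens):
        p_y = normal_pdf_1d(y, mu, v_y)
        if p_y > 0.0:
            total += p_y * math.log(p_y / d) * h
    return total

print(c, h * sum(g))
for mu in (0.0, 1.0, 3.0):
    print(mu, kl_loss(mu, g0), kl_loss(mu, g))
```

In this example $c > 1$, the new density integrates to one, and its loss is smaller at each $\mu$ tried. A single pass is shown; the lifted part $c\, g_0$ can itself exceed $C$ just outside $S$, in which case the step would need to be repeated, and the sketch does not check the bound on $S^c$.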
Proof of Lemma 3. (i) By the Banach-Alaoglu theorem, the $\mathcal{L}_1$ unit ball $\{g : \mathbb{R}^p \to \mathbb{R} \mid \int g(y)\,dy \le 1\}$ is weak* compact. Also, it is easy to check that the bounded set $\{g : \mathbb{R}^p \to \mathbb{R} \mid 0 \le g(y) \le C\}$ is closed and thus compact. So their intersection $\mathcal{A}^*$ is compact.

(ii) Because $\mathcal{L}_1$ is a separable normed space, the weak* topology on the closed ball of its dual space can be metrized by (27). And since every compact metric space is separable and second countable, (ii) follows immediately from (i).

(iii) Suppose $g^* \in \mathcal{A}^*$ but $g^* \notin \mathcal{A}$, then $\int g^*(y)\,dy < 1$. If $\int g^*(y)\,dy = 0$, its loss function $L(\mu, g^*) = \infty$ for any $\mu$ and thus $g^*$ is inadmissible. Otherwise let $g' = g^* / \int g^*(y)\,dy$, then $\int g'(y)\,dy = 1$ and it is easy to check that $g'$ dominates $g^*$. Truncate $g'$ as in (35) if necessary, and then it yields a $g \in \mathcal{A}$ that dominates $g'$ and therefore dominates $g^*$. □

Proof of Lemma 4. (i) Suppose $\{g_i\}$ is a sequence of functions in $\mathcal{A}^*$ and $g_i \to g \in \mathcal{A}^*$ under the weak* topology.

(a) We first consider the case where $g$ is bounded away from 0. To prove that $\liminf_{i \to \infty} L(\mu, g_i) \ge L(\mu, g)$ for all $\mu \in \mathbb{R}^p$, we only need to show

(36)    $L(\mu, g) - \liminf_{i \to \infty} L(\mu, g_i) = \limsup_{i \to \infty} \int p(y \mid \mu) \log g_i(y)\,dy - \int p(y \mid \mu) \log g(y)\,dy \le 0.$

If there exists a positive constant $\epsilon_0$ such that $g > \epsilon_0$ a.e., then $p(y \mid \mu) / g(y)$ is an $\mathcal{L}_1$ function. Therefore,

$\limsup_{i \to \infty} \int p(y \mid \mu) \log g_i(y)\,dy - \int p(y \mid \mu) \log g(y)\,dy$

$\quad = \limsup_{i \to \infty} \int p(y \mid \mu) \log \dfrac{g_i(y)}{g(y)}\,dy$

(37)    $\quad \le \limsup_{i \to \infty} \int p(y \mid \mu) \left( \dfrac{g_i(y)}{g(y)} - 1 \right) dy$

$\quad = \limsup_{i \to \infty} \int \dfrac{p(y \mid \mu)}{g(y)}\, g_i(y)\,dy - 1$

$\quad = 0,$

where the inequality follows from the fact that $\log x \le x - 1$ for any $x > 0$, and the last equality follows from (26) applied to the $\mathcal{L}_1$ function $p(y \mid \mu) / g(y)$. This proves that the lemma holds whenever $g$ is bounded away from 0.

(b) Let $N = \{y : g(y) = 0\}$. If $N$ has positive measure, then the assumption that $g_i \to g$ under the weak* topology implies that $g_i(y) \to 0$ in measure on $N$. Hence

(38)    $\lim_{i \to \infty} \int p(y \mid \mu) \log g_i(y)\,dy = -\infty$

by the bounded convergence theorem for convergence in measure.

(c) Now the final possibility is that $N$ has measure 0, but $g$ is not bounded away from 0. Then for any fixed $\epsilon > 0$, let $L(\epsilon) = \{y \mid g(y) > \epsilon\}$. Thus,

$\limsup_{i \to \infty} \int p(y \mid \mu) \log g_i(y)\,dy$

(39)    $\quad = \limsup_{i \to \infty} \left[ \int_{L(\epsilon)} p(y \mid \mu) \log g_i(y)\,dy + \int_{L^c(\epsilon)} p(y \mid \mu) \log g_i(y)\,dy \right]$

$\quad \le \int_{L(\epsilon)} p(y \mid \mu) \log g(y)\,dy + \log C \int_{L^c(\epsilon)} p(y \mid \mu)\,dy.$

The above inequality follows from the truth of the lemma when $g$ is bounded away from 0 and the definition that $g_i \in \mathcal{A}^*$ satisfies $g_i \le C$. Now let $\epsilon \downarrow 0$, then $L(\epsilon) \to \mathbb{R}^p$ since $g > 0$ a.e. Therefore, by the bounded convergence theorem,

(40)    $\limsup_{i \to \infty} \int p(y \mid \mu) \log g_i(y)\,dy \le \int p(y \mid \mu) \log g(y)\,dy + 0 = \int p(y \mid \mu) \log g(y)\,dy.$

This proves (i) since

(41)    $L(\mu, g_i) = \int p(y \mid \mu) \log p(y \mid \mu)\,dy - \int p(y \mid \mu) \log g_i(y)\,dy.$

(ii) Suppose $g_1, g_2 \in \mathcal{A}^*_+$ and $g_\lambda = \lambda g_1 + (1 - \lambda) g_2$ with $0 < \lambda < 1$, then

$L(\mu, g_\lambda) = \int p(y \mid \mu) \log \dfrac{p(y \mid \mu)}{g_\lambda(y)}\,dy$

$\quad = \int p(y \mid \mu) \log p(y \mid \mu)\,dy - \int p(y \mid \mu) \log[\lambda g_1(y) + (1 - \lambda) g_2(y)]\,dy$

$\quad < \int p(y \mid \mu) \log p(y \mid \mu)\,dy - \int p(y \mid \mu) [\lambda \log g_1(y) + (1 - \lambda) \log g_2(y)]\,dy$

$\quad = \lambda \int p(y \mid \mu) \log \dfrac{p(y \mid \mu)}{g_1(y)}\,dy + (1 - \lambda) \int p(y \mid \mu) \log \dfrac{p(y \mid \mu)}{g_2(y)}\,dy$

$\quad = \lambda L(\mu, g_1) + (1 - \lambda) L(\mu, g_2),$

where the inequality follows from Jensen's inequality. Thus, the strict convexity of $L(\mu, \cdot)$ on $\mathcal{A}^*_+$ is verified. □